Quotation Extraction for Portuguese

Authors

  • William Paulo Ducca Fernandes
  • Eduardo Motta
  • Ruy Luiz Milidiú
Abstract

Quotation extraction consists of identifying quotations and their authors. In this work, we present a Quotation Extraction system for Portuguese based on Entropy Guided Transformation Learning, a supervised Machine Learning algorithm. This is the first system that uses a Machine Learning approach for Portuguese quotation extraction. To train and evaluate the proposed system, we build the GLOBOQUOTES corpus, with news extracted from the GLOBO.COM portal. Our system obtains an Fβ=1 score of 79.02% for the subtask of associating a quotation with its author. For the whole Quotation Extraction task, the observed Fβ=1 score is 66.03%. These findings indicate that the overall extraction quality is highly dependent on the quotation identification subtask.
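For readers unfamiliar with the metric, the Fβ score reported above is the weighted harmonic mean of precision and recall, and β=1 weights the two equally. The following is a minimal sketch of that standard formula. The function name `f_beta` and the example inputs are illustrative assumptions and are not taken from the paper, which does not list its precision and recall values here.

```python
def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall.

    With beta = 1 this is the usual F1 score. Inputs are fractions in [0, 1].
    """
    if not (0.0 <= precision <= 1.0 and 0.0 <= recall <= 1.0):
        raise ValueError("precision and recall must lie in [0, 1]")
    b2 = beta * beta
    denominator = b2 * precision + recall
    if denominator == 0.0:
        # Both precision and recall are zero: define the score as zero.
        return 0.0
    return (1.0 + b2) * precision * recall / denominator


if __name__ == "__main__":
    # Hypothetical values, chosen only to illustrate the call.
    print(f_beta(0.8, 0.6))
```

Because the harmonic mean is pulled toward the lower of its two inputs, a weak precision or recall in one stage limits the combined score.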

Similar resources

QUEMDISSE? Reported speech in Portuguese

This paper presents some work on direct and indirect speech in Portuguese using corpus-based methods: we report on a study whose aim was to identify (i) Portuguese verbs used to introduce reported speech and (ii) syntactic patterns used to convey reported speech, in order to enhance the performance of a quotation extraction system, dubbed QUEMDISSE?. In addition, (iii) we present a Portuguese c...

A Lexicon of French Quotation Verbs for Automatic Quotation Extraction

Quotation extraction is an important information extraction task, especially when dealing with news wires. Quotations can be found in various configurations. In this paper, we focus on direct quotations introduced by a parenthetical clause, headed by a “quotation verb”. Our study is based on a large French news wire corpus from the Agence France-Presse. We introduce and motivate an analysis at ...

Extraction of Unmarked Quotations in Newspapers: A Study Based on Direct Speech Extraction Systems

This paper presents work in progress to automatically extract quotation sentences from newspaper articles. The focus is the extraction and annotation of unmarked quotation sentences. A linguistic study shows that unmarked quotation sentences can be formalised into 16 patterns that can be used to develop an extraction grammar. The question of unmarked quotation boundaries identification is also ...

Automatically Detecting and Attributing Indirect Quotations

Direct quotations are used for opinion mining and information extraction as they have an easy to extract span and they can be attributed to a speaker with high accuracy. However, simply focusing on direct quotations ignores around half of all reported speech, which is in the form of indirect or mixed speech. This work presents the first large-scale experiments in indirect and mixed quotation ex...

An Apparel Trade Quotation Architecture Based on BPM and SOA

Based on the analysis of problems and difficulties in apparel quotation system, this paper puts forward the combination of BPM and SOA as a new idea for analysis of apparel quotation system, according to their advantages in business goals and requirements analysis, and the corresponding services’ definition, extraction, optimization and integration. Through the combination, system flexibility, ...

Publication date: 2011